Stock Volatility Analysis and Forecasting Using GARCH and EGARCH Models: A Comparative Study of Microsoft and Apple¶
by¶
Imonikhe Ayeni¶
Abstract¶
This project provides a comprehensive analysis of stock price volatility, focusing on two technology giants: Microsoft Corporation (MSFT) and Apple Inc. (AAPL). Initially, a comparative study of their historical closing prices and daily returns is conducted to understand and contrast their inherent "swinginess" or risk profiles. Following this comparative assessment, the project delves into advanced time series modeling, specifically applying Generalized Autoregressive Conditional Heteroskedasticity (GARCH) and Exponential GARCH (EGARCH) models to forecast Microsoft's stock volatility. The EGARCH model, by accounting for asymmetric responses to market shocks (the "leverage effect" where negative news has a disproportionately larger impact on volatility than positive news), is identified as the superior forecasting tool. A robust walk-forward validation strategy is employed to generate realistic, out-of-sample volatility predictions. The findings highlight the presence of volatility clustering and persistence in both stocks, with EGARCH effectively capturing the critical leverage effect in Microsoft's volatility. This research offers valuable insights for investors seeking to better understand and manage risk in dynamic financial markets.
Introduction¶
In today's fast-paced and interconnected financial markets, understanding and anticipating stock price movements are paramount. Beyond simply knowing if a stock goes up or down, investors, analysts, and policymakers are keenly interested in its volatility – a measure of how much its price is expected to fluctuate or "swing" over a given period. High volatility signifies greater uncertainty and risk, while low volatility suggests more stable price movements. Accurate volatility forecasting is therefore crucial for effective risk management, portfolio optimization, and derivatives pricing.
This project embarks on a detailed exploration of stock volatility by focusing on two of the world's most influential technology companies: Microsoft Corporation (MSFT) and Apple Inc. (AAPL). Both companies hold significant weight in global markets, and understanding their individual and comparative risk characteristics provides valuable insights into broader market dynamics.
We begin by conducting a comparative analysis of Microsoft's and Apple's historical stock performance to identify differences in their inherent volatility. Following this, the project transitions to a more in-depth investigation of Microsoft's stock volatility specifically. Here, we employ sophisticated econometric models, namely GARCH (Generalized Autoregressive Conditional Heteroskedasticity) and EGARCH (Exponential GARCH). These models are cornerstones of modern financial econometrics, designed to capture key empirical features of financial returns, such as volatility clustering (where large price changes tend to be followed by large price changes, and vice-versa) and the leverage effect (the observation that negative news often triggers a larger increase in volatility than positive news of the same magnitude).
Through this comprehensive analysis and forecasting endeavor, our aim is to provide clear, actionable insights into the risk profiles of these tech giants and to demonstrate the power of advanced quantitative techniques in navigating the complexities of stock market volatility for a non-technical audience.
Objectives¶
The specific objectives of this study are to:
Analyze Historical Stock Performance and Volatility (MSFT vs. AAPL):¶
- Examine and visualize the historical daily closing prices and percentage returns for both Microsoft and Apple stock.
- Compute and compare key descriptive statistics, including measures of central tendency and dispersion, to understand their basic return characteristics and overall "swinginess."
Compare Volatility Characteristics (MSFT vs. AAPL):¶
- Conduct a direct quantitative comparison of the historical volatility levels between Microsoft and Apple stock, determining which company's stock has exhibited higher overall "swinginess" over the observed period.
Model Microsoft's Volatility Dynamics using GARCH and EGARCH:¶
- Apply the GARCH(1,1) model to Microsoft's daily returns to capture volatility clustering and persistence, understanding how past squared returns and past volatility influence current volatility.
- Apply the EGARCH(1,1) model to Microsoft's daily returns to explicitly test for the "leverage effect," assessing whether negative price shocks induce a greater volatility response than positive shocks of similar magnitude.
- Compare the GARCH and EGARCH models using statistical criteria (Log-Likelihood, AIC, BIC) to determine the superior model for Microsoft's volatility dynamics.
Forecast Microsoft's Future Volatility:¶
- Utilize the best-fitting model (EGARCH) to generate out-of-sample forecasts for Microsoft's stock volatility over a defined future horizon.
- Employ a walk-forward validation strategy to simulate realistic forecasting scenarios, continually updating the model with new data to produce robust one-step-ahead predictions.
Communicate Insights to a Non-Technical Audience:¶
- Translate complex statistical findings and model interpretations into clear, accessible language, using visual aids (e.g., plots of returns versus predicted volatility bounds) to effectively convey practical implications for investors and market participants.
Methodology¶
This project employs a systematic, data-driven methodology to analyze and forecast stock volatility. The approach integrates data handling, exploratory analysis, econometric modeling, and robust forecasting techniques to achieve the stated objectives.
1. Data Collection and Preprocessing¶
Data Source: Historical daily stock price data for Microsoft Corporation (MSFT) and Apple Inc. (AAPL) were obtained from the Alpha Vantage API, fetched in Python with the requests library.
Data Scope: The dataset spans a comprehensive period, ensuring sufficient observations for robust statistical modeling and analysis.
Data Cleaning:
- The raw data was cleaned to handle missing values, ensure consistent data types, and convert the date index into a proper DatetimeIndex.
- Column names were standardized for ease of access and readability (e.g., '1. open' to 'open', '4. close' to 'close').
Return Calculation: Daily percentage returns were calculated for both MSFT and AAPL from their closing prices using pct_change(). Logarithmic returns, $R_t = \ln(P_t / P_{t-1})$, are a common alternative in financial modeling because of their time additivity; for small daily moves the two are nearly identical.
Formula: $R_t = (P_t / P_{t-1} - 1) \times 100$, where $R_t$ is the return at time t, and $P_t$ is the closing price at time t.
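A minimal sketch of the return calculation in pandas, using a short hypothetical price series in place of the real closing prices:

```python
import numpy as np
import pandas as pd

prices = pd.Series([100.0, 102.0, 101.0, 105.0])  # hypothetical closing prices

# Log returns (in percent): R_t = ln(P_t / P_{t-1}); time-additive across days.
log_returns = np.log(prices / prices.shift(1)).dropna() * 100

# Simple percentage returns, as produced by pct_change() * 100.
simple_returns = prices.pct_change().dropna() * 100

# Time additivity: daily log returns sum to the log return over the window.
assert np.isclose(log_returns.sum(), np.log(prices.iloc[-1] / prices.iloc[0]) * 100)
```

For typical daily moves of a percent or two, `ln(1 + r)` is close to `r`, so either definition gives similar volatility estimates.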
2. Exploratory Data Analysis (EDA) and Comparative Analysis¶
Descriptive Statistics: Key descriptive statistics were computed for both MSFT and AAPL daily returns, including mean, standard deviation (as a measure of volatility), skewness, and kurtosis. This provided initial insights into their average performance and risk characteristics.
Time Series Plots:
- Plots of historical adjusted closing prices for both companies were generated to observe general trends and identify periods of significant price movements.
- Time series plots of daily returns were created to visually inspect for volatility clustering (periods of high volatility followed by more high volatility) and other stylized facts of financial time series.
Comparative Volatility Visualization: Visual tools (e.g., side-by-side plots of rolling volatility, comparative plots of returns) were used to directly compare the "swinginess" of Apple and Microsoft stock over the entire historical period.
3. Volatility Modeling with GARCH and EGARCH (Focus on Microsoft)¶
Model Framework: The arch Python library was utilized for implementing the GARCH family models.
GARCH(1,1) Model Specification:
- A standard GARCH(1,1) model was fitted to Microsoft's daily returns. This model estimates the conditional variance ($h_t$) as a function of a constant ($\omega$), the previous period's squared residual ($\epsilon_{t-1}^2$, the ARCH term, weighted by $\alpha_1$), and the previous period's conditional variance ($h_{t-1}$, the GARCH term, weighted by $\beta_1$).
- Equation: $h_t = \omega + \alpha_1 \epsilon_{t-1}^2 + \beta_1 h_{t-1}$
EGARCH(1,1) Model Specification:
- An EGARCH(1,1) model was fitted to Microsoft's daily returns, specifically including the asymmetric (leverage) term. This model captures the dynamics of the logarithm of the conditional variance ($\ln(h_t)$), allowing for a differential impact of positive and negative shocks.
- Equation: $\ln(h_t) = \omega + \alpha_1 \left(\frac{|\epsilon_{t-1}|}{\sqrt{h_{t-1}}} - E\left[\frac{|\epsilon_{t-1}|}{\sqrt{h_{t-1}}}\right]\right) + \gamma_1 \frac{\epsilon_{t-1}}{\sqrt{h_{t-1}}} + \beta_1 \ln(h_{t-1})$
- The parameter $\gamma_1$ (the leverage effect coefficient) was of particular interest, as its significance (and sign) indicates the presence and direction of asymmetry.
Model Estimation: Both models were estimated using Maximum Likelihood Estimation (MLE), a standard method for fitting ARCH-type models.
Model Evaluation and Selection: The estimated parameters' statistical significance (p-values) were examined. Model fit was compared using Log-Likelihood, Akaike Information Criterion (AIC), and Bayesian Information Criterion (BIC). The model with the highest Log-Likelihood and lowest AIC/BIC was selected as the preferred model for forecasting Microsoft's volatility.
4. Volatility Forecasting (for Microsoft)¶
Walk-Forward Validation: To evaluate the out-of-sample forecasting performance realistically, a walk-forward validation strategy was implemented over a designated test set (e.g., the last 20% of the data).
- In each step, the model was trained on all available data up to that point.
- A one-day-ahead forecast of volatility was generated.
- The training window then "walked forward" by one day, incorporating the new observation, and the process was repeated.
Forecast Extraction: Predicted conditional volatilities (standard deviations) were extracted from the model's forecasts.
Visualization of Forecasts:
- The generated walk-forward volatility forecasts were plotted against the actual historical returns for the test period.
- Predicted ±2 standard deviation bands were overlaid on the returns, providing a visual assessment of how well the model's forecasted "swinginess" contained the actual daily price movements.
Output Formatting: A custom function was developed to reformat the model's numerical forecasts into a clean, easy-to-read dictionary (JSON-like) structure, mapping future dates to their corresponding predicted volatility values.
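A hypothetical version of such a formatting helper (the function name and rounding precision are illustrative, not the project's actual implementation):

```python
import pandas as pd

def format_forecast(volatility: pd.Series) -> dict:
    """Map ISO-formatted dates to rounded predicted volatilities."""
    return {ts.date().isoformat(): round(float(v), 4)
            for ts, v in volatility.items()}

preds = pd.Series([1.2345, 1.3456],
                  index=pd.to_datetime(["2025-06-10", "2025-06-11"]))
print(format_forecast(preds))
# → {'2025-06-10': 1.2345, '2025-06-11': 1.3456}
```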
!pip install arch
import pandas as pd
import requests
import plotly.express as px
import numpy as np
import matplotlib.pyplot as plt
from arch import arch_model
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf
API_KEY = "7DX1UVYVHI*****" # Masked my API Key, get yours from https://www.alphavantage.co
def get_stock_data(symbol):
    # Fetch the full daily price history from Alpha Vantage (TIME_SERIES_DAILY)
    url = f"https://www.alphavantage.co/query?function=TIME_SERIES_DAILY&symbol={symbol}&outputsize=full&apikey={API_KEY}"
    response = requests.get(url)
    return response.json()
apple_data = get_stock_data("AAPL")
microsoft_data = get_stock_data("MSFT")
# Access data using the correct key - 'Time Series (Daily)'
#print(apple_data["Time Series (Daily)"]) # Example output
print(apple_data["Meta Data"])
{'1. Information': 'Daily Prices (open, high, low, close) and Volumes', '2. Symbol': 'AAPL', '3. Last Refreshed': '2025-06-09', '4. Output Size': 'Full size', '5. Time Zone': 'US/Eastern'}
# Assuming apple_data is your dictionary with stock data
df_apple = pd.DataFrame.from_dict(apple_data['Time Series (Daily)'], orient="index", dtype=float)
print("df_apple shape:", df_apple.shape)
print()
print(df_apple.info())
df_apple.head(10)
df_apple shape: (6440, 5) <class 'pandas.core.frame.DataFrame'> Index: 6440 entries, 2025-06-09 to 1999-11-01 Data columns (total 5 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 1. open 6440 non-null float64 1 2. high 6440 non-null float64 2 3. low 6440 non-null float64 3 4. close 6440 non-null float64 4 5. volume 6440 non-null float64 dtypes: float64(5) memory usage: 301.9+ KB None
| 1. open | 2. high | 3. low | 4. close | 5. volume | |
|---|---|---|---|---|---|
| 2025-06-09 | 204.390 | 206.00 | 200.020 | 201.45 | 72862557.0 |
| 2025-06-06 | 203.000 | 205.70 | 202.050 | 203.92 | 46607693.0 |
| 2025-06-05 | 203.500 | 204.75 | 200.150 | 200.63 | 55221235.0 |
| 2025-06-04 | 202.910 | 206.24 | 202.100 | 202.82 | 43603985.0 |
| 2025-06-03 | 201.350 | 203.77 | 200.955 | 203.27 | 46381567.0 |
| 2025-06-02 | 200.280 | 202.13 | 200.120 | 201.70 | 35423294.0 |
| 2025-05-30 | 199.370 | 201.96 | 196.780 | 200.85 | 70819942.0 |
| 2025-05-29 | 203.575 | 203.81 | 198.510 | 199.95 | 51477938.0 |
| 2025-05-28 | 200.590 | 202.73 | 199.900 | 200.42 | 45339678.0 |
| 2025-05-27 | 198.300 | 200.74 | 197.430 | 200.21 | 56288475.0 |
df_microsoft = pd.DataFrame.from_dict(microsoft_data['Time Series (Daily)'], orient="index", dtype=float)
print("df_microsoft:", df_microsoft.shape)
print()
print(df_microsoft.info())
df_microsoft.head(10)
df_microsoft: (6440, 5) <class 'pandas.core.frame.DataFrame'> Index: 6440 entries, 2025-06-09 to 1999-11-01 Data columns (total 5 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 1. open 6440 non-null float64 1 2. high 6440 non-null float64 2 3. low 6440 non-null float64 3 4. close 6440 non-null float64 4 5. volume 6440 non-null float64 dtypes: float64(5) memory usage: 301.9+ KB None
| 1. open | 2. high | 3. low | 4. close | 5. volume | |
|---|---|---|---|---|---|
| 2025-06-09 | 469.700 | 473.430 | 468.6200 | 472.75 | 16469932.0 |
| 2025-06-06 | 470.085 | 473.335 | 468.7800 | 470.38 | 15285624.0 |
| 2025-06-05 | 464.955 | 469.650 | 464.0300 | 467.68 | 20154460.0 |
| 2025-06-04 | 464.000 | 465.690 | 463.0201 | 463.87 | 14162688.0 |
| 2025-06-03 | 461.470 | 464.140 | 460.8622 | 462.97 | 15743760.0 |
| 2025-06-02 | 457.140 | 462.110 | 456.8900 | 461.97 | 16626495.0 |
| 2025-05-30 | 459.715 | 461.680 | 455.5400 | 460.36 | 34770475.0 |
| 2025-05-29 | 461.550 | 461.720 | 455.3105 | 458.68 | 13982211.0 |
| 2025-05-28 | 461.220 | 462.520 | 456.9300 | 457.36 | 17086261.0 |
| 2025-05-27 | 456.480 | 460.950 | 456.1150 | 460.69 | 20974293.0 |
Data Cleaning and Preprocessing¶
I am going to create a function that cleans our stock data. The function does the following:
- Converts the index to a DatetimeIndex named 'date'
- Removes the numbering from column names
- Converts all values to float
def clean_stock_data(df):
    # Rename index to "date" and convert to DatetimeIndex
    df.index = pd.to_datetime(df.index, format="%Y-%m-%d")
    df.index.name = "date"
    # Remove numbering from column names (e.g., "1. open" → "open")
    df.columns = [col.split(" ")[1] for col in df.columns]
    # Convert data to float
    df = df.astype(float)
    return df
cleaned_apple_df = clean_stock_data(df_apple)
cleaned_microsoft_df = clean_stock_data(df_microsoft)
print(cleaned_apple_df.info())
cleaned_apple_df.head()
<class 'pandas.core.frame.DataFrame'> DatetimeIndex: 6440 entries, 2025-06-09 to 1999-11-01 Data columns (total 5 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 open 6440 non-null float64 1 high 6440 non-null float64 2 low 6440 non-null float64 3 close 6440 non-null float64 4 volume 6440 non-null float64 dtypes: float64(5) memory usage: 301.9 KB None
| open | high | low | close | volume | |
|---|---|---|---|---|---|
| date | |||||
| 2025-06-09 | 204.39 | 206.00 | 200.020 | 201.45 | 72862557.0 |
| 2025-06-06 | 203.00 | 205.70 | 202.050 | 203.92 | 46607693.0 |
| 2025-06-05 | 203.50 | 204.75 | 200.150 | 200.63 | 55221235.0 |
| 2025-06-04 | 202.91 | 206.24 | 202.100 | 202.82 | 43603985.0 |
| 2025-06-03 | 201.35 | 203.77 | 200.955 | 203.27 | 46381567.0 |
print(cleaned_microsoft_df.info())
cleaned_microsoft_df.head()
<class 'pandas.core.frame.DataFrame'> DatetimeIndex: 6440 entries, 2025-06-09 to 1999-11-01 Data columns (total 5 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 open 6440 non-null float64 1 high 6440 non-null float64 2 low 6440 non-null float64 3 close 6440 non-null float64 4 volume 6440 non-null float64 dtypes: float64(5) memory usage: 301.9 KB None
| open | high | low | close | volume | |
|---|---|---|---|---|---|
| date | |||||
| 2025-06-09 | 469.700 | 473.430 | 468.6200 | 472.75 | 16469932.0 |
| 2025-06-06 | 470.085 | 473.335 | 468.7800 | 470.38 | 15285624.0 |
| 2025-06-05 | 464.955 | 469.650 | 464.0300 | 467.68 | 20154460.0 |
| 2025-06-04 | 464.000 | 465.690 | 463.0201 | 463.87 | 14162688.0 |
| 2025-06-03 | 461.470 | 464.140 | 460.8622 | 462.97 | 15743760.0 |
Visualizing Closing Prices¶
fig = px.line(cleaned_apple_df, x=cleaned_apple_df.index, y="close",
title="Apple Stock Closing Price Over Time",
labels={"close": "Closing Price", "index": "Date"},
line_shape="linear")
# Customize line color
fig.update_traces(line=dict(color="orange"))
# Show the interactive chart
fig.show()
fig = px.line(cleaned_microsoft_df,
x=cleaned_microsoft_df.index,
y="close",
title="Microsoft Stock Closing Price Over Time",
labels={"close": "Closing Price", "index": "Date"},
line_shape="linear")
# Customize line color
fig.update_traces(line=dict(color="blue"), name="Microsoft Closing Price")
# Show the interactive chart
fig.show()
# Combine both dataframes (assuming they have the same date index)
combined_df = cleaned_apple_df[["close"]].rename(columns={"close": "Apple"})
combined_df["Microsoft"] = cleaned_microsoft_df["close"]
# Move the date index into a column for Plotly (the index is already named "date")
combined_df = combined_df.reset_index()
combined_df
| date | Apple | Microsoft | |
|---|---|---|---|
| 0 | 2025-06-09 | 201.45 | 472.75 |
| 1 | 2025-06-06 | 203.92 | 470.38 |
| 2 | 2025-06-05 | 200.63 | 467.68 |
| 3 | 2025-06-04 | 202.82 | 463.87 |
| 4 | 2025-06-03 | 203.27 | 462.97 |
| ... | ... | ... | ... |
| 6435 | 1999-11-05 | 88.31 | 91.56 |
| 6436 | 1999-11-04 | 83.62 | 91.75 |
| 6437 | 1999-11-03 | 81.50 | 92.00 |
| 6438 | 1999-11-02 | 80.25 | 92.56 |
| 6439 | 1999-11-01 | 77.62 | 92.37 |
6440 rows × 3 columns
# Create an interactive line chart
fig = px.line(combined_df, x="date", y=["Apple", "Microsoft"],
title="Apple vs Microsoft Closing Prices",
labels={"value": "Closing Price", "date": "Date", "variable": "Stock"},
line_shape="linear")
# Reduce marker size
fig.update_traces(marker=dict(size=3))
# Show interactive chart
fig.show()
Analysis of the Apple vs Microsoft Closing Prices Chart¶
These charts compare Apple (AAPL) and Microsoft (MSFT) stock prices over time. Here are my key observations:
1. Early 2000s – Microsoft Dominance
Microsoft (red) was relatively stable but slightly declining. Apple (blue) was at much lower levels, struggling in the early 2000s before making a comeback.
2. Mid-2000s to 2012 – Apple's Surge
Around 2007–2012, Apple experienced a massive rally, likely due to: the launch of the iPhone in 2007; the increasing popularity of Macs and iPads; and strong revenue growth and innovation. Microsoft remained relatively flat during this period.
3. 2014–2016 – Apple's Drop
There is a sharp dip in Apple's stock price. The most likely explanation is Apple's 7-for-1 stock split in June 2014, which appears as a mechanical price drop because this series is not split-adjusted; an iPhone sales slowdown in some quarters and broader market corrections may also have contributed.
4. Post-2016 – Microsoft's Comeback
Microsoft rebounded strongly around 2016, likely due to: the transition to cloud computing (Azure became a major revenue driver); subscription models for Office 365; and Satya Nadella's leadership pivoting towards AI and enterprise solutions.
5. Recent Years – Volatility & Growth
Apple and Microsoft both show strong growth post-2020. Apple had some major corrections but continued climbing, while Microsoft kept steadier growth, with cloud and AI playing key roles. COVID-19 market effects (2020) caused spikes and dips in both stocks.
Final Insight: Apple had a more explosive growth phase driven by product innovation; Microsoft showed a more stable rise, driven by software and cloud computing.
Return on Investment¶
Looking at the plots so far, we might be tempted to conclude that Microsoft post-2020 is a "better" stock than Apple because its daily closing price is higher. But price is just one factor that an investor must consider when creating an investment strategy.
One way investors compare stocks is by looking at their returns instead. A return is the change in value of an investment, expressed as a percentage. So let's look at the daily returns for our two stocks.
# Sort DataFrame ascending by date
cleaned_apple_df.sort_index(ascending=True, inplace=True)
# Create "return" column
cleaned_apple_df["return"] = cleaned_apple_df["close"].pct_change()*100
print("cleaned_apple_df:", cleaned_apple_df.shape)
print(cleaned_apple_df.info())
cleaned_apple_df.head()
cleaned_apple_df: (6440, 6) <class 'pandas.core.frame.DataFrame'> DatetimeIndex: 6440 entries, 1999-11-01 to 2025-06-09 Data columns (total 6 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 open 6440 non-null float64 1 high 6440 non-null float64 2 low 6440 non-null float64 3 close 6440 non-null float64 4 volume 6440 non-null float64 5 return 6439 non-null float64 dtypes: float64(6) memory usage: 352.2 KB None
| open | high | low | close | volume | return | |
|---|---|---|---|---|---|---|
| date | ||||||
| 1999-11-01 | 80.00 | 80.69 | 77.37 | 77.62 | 2487300.0 | NaN |
| 1999-11-02 | 78.00 | 81.69 | 77.31 | 80.25 | 3564600.0 | 3.388302 |
| 1999-11-03 | 81.62 | 83.25 | 81.00 | 81.50 | 2932700.0 | 1.557632 |
| 1999-11-04 | 82.06 | 85.37 | 80.62 | 83.62 | 3384700.0 | 2.601227 |
| 1999-11-05 | 84.62 | 88.37 | 84.00 | 88.31 | 3721500.0 | 5.608706 |
# Sort DataFrame ascending by date
cleaned_microsoft_df.sort_index(ascending=True, inplace=True)
# Create "return" column
cleaned_microsoft_df["return"] = cleaned_microsoft_df["close"].pct_change()*100
print("cleaned_microsoft_df:", cleaned_microsoft_df.shape)
print(cleaned_microsoft_df.info())
cleaned_microsoft_df.head()
cleaned_microsoft_df: (6440, 6) <class 'pandas.core.frame.DataFrame'> DatetimeIndex: 6440 entries, 1999-11-01 to 2025-06-09 Data columns (total 6 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 open 6440 non-null float64 1 high 6440 non-null float64 2 low 6440 non-null float64 3 close 6440 non-null float64 4 volume 6440 non-null float64 5 return 6439 non-null float64 dtypes: float64(6) memory usage: 352.2 KB None
| open | high | low | close | volume | return | |
|---|---|---|---|---|---|---|
| date | ||||||
| 1999-11-01 | 93.25 | 94.19 | 92.12 | 92.37 | 26630600.0 | NaN |
| 1999-11-02 | 92.75 | 94.50 | 91.94 | 92.56 | 23174500.0 | 0.205694 |
| 1999-11-03 | 92.94 | 93.50 | 91.50 | 92.00 | 22258500.0 | -0.605013 |
| 1999-11-04 | 92.31 | 92.75 | 90.31 | 91.75 | 27119700.0 | -0.271739 |
| 1999-11-05 | 91.81 | 92.87 | 90.50 | 91.56 | 35083700.0 | -0.207084 |
df_stock = pd.concat([
cleaned_apple_df.assign(stock="Apple"),
cleaned_microsoft_df.assign(stock="Microsoft")])
df_stock
| open | high | low | close | volume | return | stock | |
|---|---|---|---|---|---|---|---|
| date | |||||||
| 1999-11-01 | 80.000 | 80.690 | 77.3700 | 77.62 | 2487300.0 | NaN | Apple |
| 1999-11-02 | 78.000 | 81.690 | 77.3100 | 80.25 | 3564600.0 | 3.388302 | Apple |
| 1999-11-03 | 81.620 | 83.250 | 81.0000 | 81.50 | 2932700.0 | 1.557632 | Apple |
| 1999-11-04 | 82.060 | 85.370 | 80.6200 | 83.62 | 3384700.0 | 2.601227 | Apple |
| 1999-11-05 | 84.620 | 88.370 | 84.0000 | 88.31 | 3721500.0 | 5.608706 | Apple |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 2025-06-03 | 461.470 | 464.140 | 460.8622 | 462.97 | 15743760.0 | 0.216464 | Microsoft |
| 2025-06-04 | 464.000 | 465.690 | 463.0201 | 463.87 | 14162688.0 | 0.194397 | Microsoft |
| 2025-06-05 | 464.955 | 469.650 | 464.0300 | 467.68 | 20154460.0 | 0.821351 | Microsoft |
| 2025-06-06 | 470.085 | 473.335 | 468.7800 | 470.38 | 15285624.0 | 0.577318 | Microsoft |
| 2025-06-09 | 469.700 | 473.430 | 468.6200 | 472.75 | 16469932.0 | 0.503848 | Microsoft |
12880 rows × 7 columns
df_stock = df_stock.reset_index()
df_stock
| date | open | high | low | close | volume | return | stock | |
|---|---|---|---|---|---|---|---|---|
| 0 | 1999-11-01 | 80.000 | 80.690 | 77.3700 | 77.62 | 2487300.0 | NaN | Apple |
| 1 | 1999-11-02 | 78.000 | 81.690 | 77.3100 | 80.25 | 3564600.0 | 3.388302 | Apple |
| 2 | 1999-11-03 | 81.620 | 83.250 | 81.0000 | 81.50 | 2932700.0 | 1.557632 | Apple |
| 3 | 1999-11-04 | 82.060 | 85.370 | 80.6200 | 83.62 | 3384700.0 | 2.601227 | Apple |
| 4 | 1999-11-05 | 84.620 | 88.370 | 84.0000 | 88.31 | 3721500.0 | 5.608706 | Apple |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 12875 | 2025-06-03 | 461.470 | 464.140 | 460.8622 | 462.97 | 15743760.0 | 0.216464 | Microsoft |
| 12876 | 2025-06-04 | 464.000 | 465.690 | 463.0201 | 463.87 | 14162688.0 | 0.194397 | Microsoft |
| 12877 | 2025-06-05 | 464.955 | 469.650 | 464.0300 | 467.68 | 20154460.0 | 0.821351 | Microsoft |
| 12878 | 2025-06-06 | 470.085 | 473.335 | 468.7800 | 470.38 | 15285624.0 | 0.577318 | Microsoft |
| 12879 | 2025-06-09 | 469.700 | 473.430 | 468.6200 | 472.75 | 16469932.0 | 0.503848 | Microsoft |
12880 rows × 8 columns
fig = px.histogram(df_stock, x="return", color="stock", nbins=200,
title="Distribution of Daily Returns for Apple and Microsoft",
labels={"return": "Daily Return", "stock": "Stock"},
barmode="group") # Group bars side by side for easier comparison
# Show the interactive plot
fig.show()
Daily Returns of Apple vs Microsoft¶
# Plot using Plotly Express
fig = px.line(
df_stock,
x="date",
y="return",
color="stock",
title="Daily Volatility of Returns of Apple vs Microsoft",
labels={"return": "Daily Return", "date": "Date"}
)
# Show plot
fig.show()
The chart above represents the Daily Returns of Apple vs. Microsoft over time,
where:
- Apple's daily returns are plotted in blue.
- Microsoft's daily returns are plotted in red.
Key Observations:
Volatility Over Time:¶
The early years (before 2010) show higher fluctuations in daily returns for both Apple and Microsoft. After 2010, volatility appears to stabilize, but large negative spikes are still present occasionally.
Extreme Negative Returns:¶
There are multiple sharp drops in daily returns, especially for Apple, indicating market crashes, company-specific news, or data artifacts. Apple's two most extreme negative spikes, in June 2014 and August 2020, coincide with its 7-for-1 and 4-for-1 stock splits, which register as huge one-day drops because these prices are not split-adjusted; genuine market shocks such as the 2008 Financial Crisis and the March 2020 COVID-19 crash also produce visible clusters of large moves.
Comparing Apple & Microsoft:¶
Both stocks exhibit similar overall trends, suggesting they respond to market-wide events similarly. Apple (blue) seems to have more extreme negative returns than Microsoft (red), especially post-2010.
Mean Reversion:¶
The daily returns oscillate around zero, meaning that in the long run both stocks have periods of gains and losses. However, consistent small positive returns over time compound into an upward trend in stock price, which is what gives Microsoft its edge among investors.
Daily Volatility of Stock Prices¶
Let's start by measuring the daily volatility of our two stocks. Since our data frequency is daily, this is simply the standard deviation of the daily returns.
# apple_daily_volatility = cleaned_apple_df["return"].std()
# microsoft_daily_volatility = cleaned_microsoft_df['return'].std()
# print("apple_daily_volatility:", apple_daily_volatility)
# print("microsoft_daily_volatility:", microsoft_daily_volatility)
volatility_df = df_stock.groupby("stock")["return"].std().reset_index()
volatility_df.columns = ["Stock", "Daily Volatility"]
volatility_df
| Stock | Daily Volatility | |
|---|---|---|
| 0 | Apple | 2.950596 |
| 1 | Microsoft | 2.003329 |
# Plot with Plotly Express
fig = px.bar(volatility_df,
x="Stock",
y="Daily Volatility",
title="Daily Volatility of Apple vs Microsoft",
text_auto=".2f",
color="Stock",
labels={"Daily Volatility": "Volatility (Std Dev)"})
fig.show()
Looks like Apple is more volatile than Microsoft. This reinforces what we saw in our time series plot.
While daily volatility is useful, investors are also interested in volatility over other time periods — like annual volatility. Keep in mind that a year isn't 365 days for a stock market, though. After excluding weekends and holidays, most markets have only 252 trading days.
volatility_df["Annual Volatility"] = volatility_df["Daily Volatility"] * np.sqrt(252)
# Plot annual volatility
fig = px.bar(volatility_df,
x="Stock",
y="Annual Volatility",
title="Annual Volatility of Apple vs Microsoft",
text_auto=".2f",
color="Stock",
labels={"Annual Volatility": "Volatility (Annual Std Dev)"})
fig.show()
Again, Apple has higher volatility than Microsoft.
microsoft_rolling_50d_volatility = cleaned_microsoft_df.rolling(window=50).std().dropna()
print("rolling_50d_volatility type:", type(microsoft_rolling_50d_volatility))
print("rolling_50d_volatility shape:", microsoft_rolling_50d_volatility.shape)
microsoft_rolling_50d_volatility.head()
rolling_50d_volatility type: <class 'pandas.core.frame.DataFrame'> rolling_50d_volatility shape: (6390, 6)
| open | high | low | close | volume | return | |
|---|---|---|---|---|---|---|
| date | ||||||
| 2000-01-12 | 11.987091 | 12.156538 | 11.535577 | 11.903356 | 1.936607e+07 | 2.370743 |
| 2000-01-13 | 11.959993 | 12.148049 | 11.504730 | 11.896689 | 1.938532e+07 | 2.381335 |
| 2000-01-14 | 11.957029 | 12.197957 | 11.500824 | 11.942805 | 1.935144e+07 | 2.436428 |
| 2000-01-18 | 12.008028 | 12.273992 | 11.572642 | 12.034926 | 1.937931e+07 | 2.455687 |
| 2000-01-19 | 12.015926 | 12.244706 | 11.538265 | 11.973424 | 1.951835e+07 | 2.684886 |
# Create a DataFrame for plotting
df_plot = cleaned_microsoft_df.copy()
df_plot["50-day rolling volatility"] = microsoft_rolling_50d_volatility["return"]
# Plot using Plotly Express
fig = px.line(df_plot, x=df_plot.index, y=["return", "50-day rolling volatility"],
title="Microsoft Daily Return and 50-day Rolling Volatility",
labels={"value": "Return / Volatility", "index": "Date", "variable": "Metric"},
color_discrete_map={"return": "blue", "50-day rolling volatility": "red"}) # Customize colors
# Show interactive chart
fig.show()
- Daily Returns (Blue Line)
This line shows the percentage change in Microsoft's stock price from one day to the next. It fluctuates around zero, meaning the stock experiences both gains and losses on different days. The spikes (both positive and negative) indicate periods of high volatility, where the stock had significant price swings.
- 50-Day Rolling Volatility (Red Line)
This is a smoothed measure of how much daily returns fluctuate over time. It is calculated as the standard deviation of daily returns over the last 50 days, helping to identify trends in volatility. When this line rises, it signals increased market uncertainty or major price swings; when it falls, it suggests the stock is experiencing more stable movements.
Key Insights from the Plot
- Periods of High Volatility: Noticeable spikes in the red line coincide with sharp movements in daily returns (blue line). This could be due to earnings reports, macroeconomic events, or global market shocks.
- Calm Periods: When the red line trends downward, the daily return fluctuations become smaller, indicating a more stable stock.
- Market Crashes or Events: Sharp spikes in both lines can indicate a major financial event (like the 2008 crash, COVID-19, or company-specific news).
If you're an investor, high volatility means higher risk but also potential for higher rewards. If you're a risk-averse trader, you may prefer investing during low-volatility periods.
Here we can see that volatility goes up when the returns change drastically — either up or down. For instance, we can see a big increase in volatility in May 2020, when there were several days of large negative returns. We can also see volatility go down in August 2022, when there are only small day-to-day changes in returns.
This plot reveals a problem. We want to use returns to see if high volatility on one day is associated with high volatility on the following day. But high volatility is caused by large changes in returns, which can be either positive or negative. How can we assess negative and positive numbers together without them canceling each other out? One solution is to take the absolute value of the numbers, which is what we do to calculate performance metrics like mean absolute error. The other solution, which is more common in this context, is to square all the values.
cleaned_microsoft_df["squared_return"] = cleaned_microsoft_df["return"] ** 2
# Create the time series plot
fig = px.line(
cleaned_microsoft_df,
x=cleaned_microsoft_df.index,
y="squared_return",
title="Time Series of Squared Returns",
labels={"squared_return": "Squared Return", "index": "Date"},
line_shape="linear"
)
# Customize line color
fig.update_traces(line=dict(color="blue"))
# Show the interactive plot
fig.show()
MICROSOFT STOCK PRICE VOLATILITY PREDICTION USING GARCH AND EGARCH MODELS¶
A GARCH (Generalized Autoregressive Conditional Heteroskedasticity) model is used to model time-varying volatility in financial data. It extends the ARCH model by incorporating past conditional variances (the p parameter) alongside past squared shocks (the q parameter), which allows it to capture volatility clustering more effectively.
In the classic Bollerslev notation, the p parameter represents the influence of past conditional variances (similar to the autoregressive term in ARMA models), while the q parameter accounts for past squared returns, modeling how unexpected market "shock" events affect future volatility. (Note that the arch library reverses this naming, using p for the shock term and q for the variance term, but for a (1,1) model the distinction is immaterial.) Both terms rely on the notion of lag. To decide how many lags our model should have, we should create ACF and PACF plots, but using the squared returns.
Lagged Values with ACF and PACF¶
GARCH models also rely on the concept of lagged values to capture volatility persistence. To determine the appropriate number of lags (p, q) in our model, we should examine Autocorrelation Function (ACF) and Partial Autocorrelation Function (PACF) plots—but applied to squared returns instead of raw returns. This helps identify how past volatility influences future volatility, as squared returns highlight volatility clustering more effectively.
fig, ax = plt.subplots(figsize=(15, 6))
# Plot ACF of squared returns
plot_acf(cleaned_microsoft_df['squared_return'].dropna(), ax=ax)
plt.xlabel("Lag [days]")
plt.ylabel("Correlation Coefficient")
plt.title("Autocorrelation Function (ACF) of Squared Returns for Microsoft")
plt.show()
fig, ax = plt.subplots(figsize=(15, 6))
# Plot PACF of squared returns
plot_pacf(cleaned_microsoft_df['squared_return'].dropna(), ax=ax)
plt.xlabel("Lag [days]")
plt.ylabel("Correlation Coefficient")
plt.title("Partial Autocorrelation Function (PACF) of Squared Returns for Microsoft")
plt.show()
In our PACF, it looks like a lag of 1 would be a good starting point.
Normally, at this point in the model-building process, we would split our data into training and test sets and then set a baseline. Not this time, because our model's input and output are two different measurements: we use returns to train the model, but we want it to predict volatility. A conventional test set would not contain the "true" volatility values needed to assess the model's performance, so below we only carve off a training set.
SPLIT DATASET INTO TRAINING SET¶
cutoff_test = int(len(cleaned_microsoft_df)*0.8)
microsoft_train = cleaned_microsoft_df.iloc[:cutoff_test].copy()  # copy to avoid SettingWithCopyWarning later
print("microsoft_train type:", type(microsoft_train))
print("microsoft_train shape:", microsoft_train.shape)
microsoft_train.tail()
microsoft_train type: <class 'pandas.core.frame.DataFrame'> microsoft_train shape: (5152, 7)
| open | high | low | close | volume | return | squared_return | |
|---|---|---|---|---|---|---|---|
| date | |||||||
| 2020-04-17 | 179.50 | 180.00 | 175.87 | 178.60 | 52765625.0 | 0.881157 | 0.776437 |
| 2020-04-20 | 176.63 | 178.75 | 174.99 | 175.06 | 36669595.0 | -1.982083 | 3.928652 |
| 2020-04-21 | 173.50 | 173.67 | 166.11 | 167.82 | 56203749.0 | -4.135725 | 17.104220 |
| 2020-04-22 | 171.39 | 174.00 | 170.82 | 173.52 | 34651604.0 | 3.396496 | 11.536187 |
| 2020-04-23 | 174.11 | 175.06 | 170.91 | 171.42 | 32790804.0 | -1.210235 | 1.464669 |
microsoft_train.isna().sum()
open 0 high 0 low 0 close 0 volume 0 return 1 squared_return 1 dtype: int64
microsoft_train = microsoft_train.fillna(0)
microsoft_train.isna().sum()
open 0 high 0 low 0 close 0 volume 0 return 0 squared_return 0 dtype: int64
Build and Train the GARCH model¶
garch_model = arch_model(
    microsoft_train['return'],  # pass only the 'return' column as the dependent variable
    p=1,
    q=1,
    rescale=False
).fit(disp="off")  # fit the model
print("model type:", type(garch_model))
# Show model summary
print(garch_model.summary())
model type: <class 'arch.univariate.base.ARCHModelResult'>
Constant Mean - GARCH Model Results
==============================================================================
Dep. Variable: return R-squared: 0.000
Mean Model: Constant Mean Adj. R-squared: 0.000
Vol Model: GARCH Log-Likelihood: -10195.6
Distribution: Normal AIC: 20399.2
Method: Maximum Likelihood BIC: 20425.4
No. Observations: 5152
Date: Tue, Jun 10 2025 Df Residuals: 5151
Time: 19:28:39 Df Model: 1
Mean Model
==========================================================================
coef std err t P>|t| 95.0% Conf. Int.
--------------------------------------------------------------------------
mu 0.0699 2.223e-02 3.143 1.671e-03 [2.630e-02, 0.113]
Volatility Model
===========================================================================
coef std err t P>|t| 95.0% Conf. Int.
---------------------------------------------------------------------------
omega 0.0500 3.552e-02 1.408 0.159 [-1.960e-02, 0.120]
alpha[1] 0.0885 1.898e-02 4.662 3.124e-06 [5.130e-02, 0.126]
beta[1] 0.9115 2.665e-02 34.206 1.964e-256 [ 0.859, 0.964]
===========================================================================
Covariance estimator: robust
EGARCH Model for Volatility Prediction¶
The Exponential GARCH (EGARCH) model is an extension of GARCH that captures asymmetry in volatility. Unlike GARCH, which assumes that positive and negative shocks impact volatility the same way, EGARCH allows negative shocks (bad news) to increase volatility more than positive shocks (good news)—a common market behavior.
egarch_model = arch_model(
    microsoft_train["return"],
    vol="EGarch",
    p=1, o=1,  # o=1 adds the asymmetric (leverage) term
    q=1,
    rescale=False
).fit(disp="off")  # fit the model and suppress output
# Print model summary
print(egarch_model.summary())
Constant Mean - EGARCH Model Results
==============================================================================
Dep. Variable: return R-squared: 0.000
Mean Model: Constant Mean Adj. R-squared: 0.000
Vol Model: EGARCH Log-Likelihood: -10110.8
Distribution: Normal AIC: 20231.7
Method: Maximum Likelihood BIC: 20264.4
No. Observations: 5152
Date: Tue, Jun 10 2025 Df Residuals: 5151
Time: 19:28:39 Df Model: 1
Mean Model
============================================================================
coef std err t P>|t| 95.0% Conf. Int.
----------------------------------------------------------------------------
mu 0.0501 2.027e-02 2.472 1.344e-02 [1.038e-02,8.983e-02]
Volatility Model
==============================================================================
coef std err t P>|t| 95.0% Conf. Int.
------------------------------------------------------------------------------
omega 0.0239 6.749e-03 3.537 4.054e-04 [1.064e-02,3.710e-02]
alpha[1] 0.1125 2.418e-02 4.653 3.274e-06 [6.511e-02, 0.160]
gamma[1] -0.0456 2.031e-02 -2.247 2.464e-02 [-8.545e-02,-5.831e-03]
beta[1] 0.9895 6.667e-03 148.411 0.000 [ 0.976, 1.003]
==============================================================================
Covariance estimator: robust
Understanding Microsoft Stock Volatility: A GARCH vs. EGARCH Model Discussion¶
To understand and predict the volatility (how much prices fluctuate) of Microsoft stock returns, we used two advanced statistical models: GARCH and EGARCH. These models go beyond just looking at past returns; they actually model how the uncertainty (volatility) in the market behaves over time.
A key concept in financial markets is "volatility clustering." This means that periods of large price changes (either up or down) tend to be followed by more large price changes, and calm periods tend to be followed by more calm periods. Our models aim to capture and quantify this behavior.
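Volatility clustering leaves a measurable fingerprint: squared returns are positively autocorrelated. As a self-contained illustration (simulated data, not the actual MSFT series; the parameter values are arbitrary but GARCH-like), we can generate a GARCH(1,1) process and check the lag-1 autocorrelation of its squared returns:

```python
import numpy as np

# Simulate a GARCH(1,1) process and show that its squared returns are
# positively autocorrelated -- the statistical signature of volatility
# clustering. (Synthetic data; omega/alpha/beta are illustrative.)
rng = np.random.default_rng(42)
omega, alpha, beta = 0.05, 0.09, 0.90
n = 5000
returns = np.empty(n)
sigma2 = omega / (1 - alpha - beta)  # start at the unconditional variance
for t in range(n):
    returns[t] = rng.normal(0.0, np.sqrt(sigma2))
    sigma2 = omega + alpha * returns[t] ** 2 + beta * sigma2

sq = returns ** 2
# Lag-1 autocorrelation of the squared returns
acf1 = np.corrcoef(sq[:-1], sq[1:])[0, 1]
print(f"Lag-1 autocorrelation of squared returns: {acf1:.3f}")  # clearly positive
```

A positive value here means a big squared return today tends to be followed by a big squared return tomorrow, which is exactly the pattern the GARCH and EGARCH models exploit.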
Model 1: The GARCH(1,1) Model¶
The GARCH(1,1) model serves as our baseline for understanding volatility.
# GARCH(1,1) Model Code
garch_model = arch_model(
    microsoft_train['return'],  # pass only the 'return' column as the dependent variable
    p=1,
    q=1,
    rescale=False
).fit(disp="off")
print("model type:", type(garch_model))
print(garch_model.summary())
Interpretation for a Non-Technical Audience:¶
Average Return (mu): The model estimates Microsoft's average daily return at about 0.0699%. This is a statistically significant positive return over the observed period.
Volatility Clustering (alpha[1]): The alpha[1] coefficient (0.0885) tells us that past unexpected price movements (shocks) significantly influence today's volatility. Essentially, if there was a big jump or drop yesterday, it leads to higher volatility today. This confirms the presence of volatility clustering.
Volatility Persistence (beta[1]): The beta[1] coefficient (0.9115) shows that past volatility levels themselves are a very strong predictor of current volatility. If the stock was volatile yesterday, it's very likely to remain volatile today. This indicates high persistence in volatility.
Long-Lasting Shocks (IGARCH): A crucial finding here is that the sum of alpha[1] and beta[1] (0.0885 + 0.9115) equals 1.0000. This is a special case called an Integrated GARCH (IGARCH) process. For our audience, this means the impact of any significant shock to volatility is extremely long-lasting. Once a period of high volatility begins, it tends to persist indefinitely rather than fully reverting to a long-term average.
Baseline Volatility (omega): The constant term omega (0.0500) represents the baseline level of volatility. In this model, it's not statistically significant, which aligns with the IGARCH finding; if shocks persist forever, there isn't a fixed long-run average volatility.
In Simple Terms:¶
The GARCH model highlights that Microsoft's stock volatility is highly predictable from its own past behavior. Big price swings (up or down) from the previous day lead to higher volatility today, and periods of high volatility tend to last for a very long time.
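The persistence finding can be made concrete with a little arithmetic on the reported coefficients. The helper below is illustrative: for persistence below 1, a volatility shock decays by half every ln(0.5) / ln(alpha + beta) days; at exactly 1 (the IGARCH case) it never decays:

```python
import math

# Persistence and shock half-life from the coefficients reported in the
# GARCH summary above.
alpha1, beta1 = 0.0885, 0.9115
persistence = alpha1 + beta1
print(f"alpha[1] + beta[1] = {persistence:.4f}")  # 1.0000 -> IGARCH

def shock_half_life(p):
    """Days for a volatility shock to decay by half; infinite when p >= 1."""
    return math.inf if round(p, 12) >= 1 else math.log(0.5) / math.log(p)

print(shock_half_life(persistence))     # inf: under IGARCH, shocks never fully die out
print(round(shock_half_life(0.98), 1))  # ~34.3 days for a hypothetical persistence of 0.98
```

The contrast with the hypothetical 0.98 case shows just how extreme the IGARCH result is: even a slightly lower persistence would imply shocks fading within a couple of months.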
Model 2: The EGARCH(1,1) Model¶
The EGARCH(1,1) model is a more sophisticated version of GARCH. It's specifically designed to capture a critical phenomenon in financial markets called the "leverage effect." This refers to the observation that negative news (price drops) often increases future volatility more than positive news (price gains) of the same magnitude.
# EGARCH(1,1) Model Code
egarch_model = arch_model(
    microsoft_train["return"],
    vol="EGarch",
    p=1, o=1,  # 'o=1' is crucial for the asymmetric (leverage) term
    q=1,
    rescale=False
).fit(disp="off")
print(egarch_model.summary())
Interpretation:¶
Average Return (mu): Similar to GARCH, the estimated average daily return for Microsoft is about 0.0501%. This is also statistically significant, indicating a reliably positive average return over the period.
Baseline Volatility (omega): The omega coefficient (0.0239) is the constant baseline level for the logarithm of volatility. It is statistically significant, suggesting a consistent underlying level.
Symmetric Impact of Shocks (alpha[1]): The alpha[1] coefficient (0.1125) captures the general, symmetric impact of past shocks on volatility. It's positive and statistically significant, meaning that larger shocks (regardless of direction) generally lead to increased volatility.
The "Leverage Effect" (gamma[1]): This is the key "leverage effect" coefficient (-0.0456). It is negative and statistically significant (p-value of 0.0246, which is less than 0.05).
What this means: The negative and significant gamma[1] directly confirms the leverage effect. It implies that bad news (negative stock returns or price drops) causes a significantly larger increase in Microsoft's future volatility than good news (positive stock returns or price increases) of the same magnitude. This is a common and important finding in equity markets, reflecting how investors react more strongly to losses.
High Volatility Persistence (beta[1]): The beta[1] coefficient (0.9895) is very close to 1 and highly statistically significant. This reinforces the finding from the GARCH model: volatility in Microsoft's stock is highly persistent. Current high (or low) volatility will tend to remain high (or low) for an extended period.
In Simple Terms:¶
The EGARCH model tells us everything the GARCH model did about volatility clustering and persistence, but it adds a crucial insight: Microsoft's stock experiences the "leverage effect," meaning negative news has a disproportionately stronger impact on future volatility than positive news.
Model Comparison and Conclusion¶
To determine which model provides a better explanation of Microsoft's stock volatility, we compare their statistical fit using measures like Log-Likelihood, AIC (Akaike Information Criterion), and BIC (Bayesian Information Criterion). For these criteria, a higher Log-Likelihood (less negative) and lower AIC/BIC values indicate a better model fit.
| Model | Log-Likelihood | AIC | BIC | Ability to Capture Asymmetry? |
|---|---|---|---|---|
| GARCH(1,1) | -10195.6 | 20399.2 | 20425.4 | No |
| EGARCH(1,1) | -10110.8 | 20231.7 | 20264.4 | Yes (Leverage Effect Confirmed) |
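As a sanity check, the AIC and BIC values in the table can be reproduced from the log-likelihoods reported in the model summaries; they match up to rounding of the reported log-likelihood:

```python
import math

# AIC = 2k - 2*LL, BIC = k*ln(n) - 2*LL, where k is the number of
# estimated parameters and n the number of training observations.
n = 5152  # training observations from the model summaries above

def aic(loglik, k):
    return 2 * k - 2 * loglik

def bic(loglik, k):
    return k * math.log(n) - 2 * loglik

# GARCH(1,1): mu, omega, alpha[1], beta[1] -> k = 4
print(round(aic(-10195.6, 4), 1), round(bic(-10195.6, 4), 1))
# EGARCH(1,1): mu, omega, alpha[1], gamma[1], beta[1] -> k = 5
print(round(aic(-10110.8, 5), 1), round(bic(-10110.8, 5), 1))
```

The extra leverage parameter costs EGARCH one degree of freedom in both criteria, yet it still wins comfortably, which is why the penalized comparison favors it.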
Key Takeaways for Microsoft Stock Volatility:¶
Volatility is Predictable: Both models agree that Microsoft's stock volatility isn't random. It exhibits volatility clustering, meaning periods of high fluctuation tend to follow each other, and vice-versa.
Volatility is Highly Persistent: Shocks to Microsoft's stock volatility tend to have a long-lasting impact, suggesting that current levels of volatility are crucial for predicting future volatility.
The "Leverage Effect" is Key: The EGARCH model, which provides a statistically superior fit, confirmed the presence of a leverage effect in Microsoft's stock. This is a vital insight: negative news disproportionately increases future volatility compared to positive news of the same magnitude.
This analysis suggests that for forecasting and managing risk related to Microsoft stock, accounting for this asymmetric response to news (the leverage effect) is crucial, making the EGARCH model the more appropriate choice for this project.
# Forecast 30 days ahead
forecast_horizon = 30
egarch_forecast = egarch_model.forecast(start=0, horizon=forecast_horizon, method='simulation')
# Extract predicted volatility
predicted_volatility_egarch = np.sqrt(egarch_forecast.variance.iloc[-1])
# Plot historical and forecasted volatility
plt.figure(figsize=(12,6))
plt.plot(microsoft_train.index[-200:], egarch_model.conditional_volatility[-200:], label="Historical Volatility", color='blue')
plt.plot(pd.date_range(microsoft_train.index[-1], periods=forecast_horizon, freq='D'),
predicted_volatility_egarch, label="Predicted Volatility (EGARCH 1,1)", linestyle="dashed", color='red')
plt.xlabel("Date")
plt.ylabel("Volatility")
plt.title("EGARCH Model - Microsoft Volatility Forecast")
plt.legend()
plt.show()
egarch_model.conditional_volatility
date
1999-11-01 1.980472
1999-11-02 1.906398
1999-11-03 1.837167
1999-11-04 1.816824
1999-11-05 1.771599
...
2020-04-17 3.799490
2020-04-20 3.651813
2020-04-21 3.642561
2020-04-22 3.807756
2020-04-23 3.741341
Name: cond_vol, Length: 5152, dtype: float64
microsoft_train.loc["2020-04-22"]
open 1.713900e+02 high 1.740000e+02 low 1.708200e+02 close 1.735200e+02 volume 3.465160e+07 return 3.396496e+00 squared_return 1.153619e+01 Name: 2020-04-22 00:00:00, dtype: float64
one_day_forecast = np.sqrt(egarch_model.forecast(horizon=1, reindex=False).variance)
print("one_day_forecast type:", type(one_day_forecast))
one_day_forecast
one_day_forecast type: <class 'pandas.core.frame.DataFrame'>
| h.1 | |
|---|---|
| date | |
| 2020-04-23 | 3.666683 |
"Walk-Forward" Volatility Forecasting for Microsoft Stock¶
# Create empty list to hold predictions
predictions = []
# Calculate size of test data (20%)
test_size = int(len(cleaned_microsoft_df) * 0.2)
# Walk forward
for i in range(test_size):
    # Create expanding training window: all data before the current target day
    y_train = cleaned_microsoft_df.iloc[: -(test_size - i)]
    # Re-train the model, passing only the 'return' column as the dependent variable
    # (note: no o term here, so this walk-forward fit is a symmetric EGARCH)
    model = arch_model(y_train['return'], vol="EGarch", p=1, q=1, rescale=False).fit(disp=0)
    # Generate the next one-day-ahead prediction (volatility = sqrt of variance)
    next_pred = model.forecast(horizon=1, reindex=False).variance.iloc[0, 0] ** 0.5
    # Append prediction to list
    predictions.append(next_pred)
# Create Series from predictions list
y_test_wfv = pd.Series(predictions, index=cleaned_microsoft_df.tail(test_size).index)
print("y_test_wfv type:", type(y_test_wfv))
print("y_test_wfv shape:", y_test_wfv.shape)
y_test_wfv
y_test_wfv type: <class 'pandas.core.series.Series'> y_test_wfv shape: (1288,)
date
2020-04-24 3.861773
2020-04-27 3.766973
2020-04-28 3.585036
2020-04-29 3.558963
2020-04-30 3.671436
...
2025-06-03 1.598900
2025-06-04 1.535321
2025-06-05 1.473635
2025-06-06 1.456554
2025-06-09 1.424157
Length: 1288, dtype: float64
#cleaned_microsoft_df.loc["2025-06-06"]
Explaining "Walk-Forward" Volatility Forecasting for Microsoft Stock¶
A "walk-forward" forecasting strategy is a more robust way to evaluate a time series model like EGARCH than a single, static forecast.
Let's break down the process and explain the results
Imagine you're trying to predict tomorrow's weather. You wouldn't use all the weather data from history up to today, make one big prediction for every day in the future, and then never update your forecast. Instead, you'd constantly update your model with the latest information available.
That's exactly what "walk-forward" forecasting does for financial volatility.
The Core Idea: Learning as We Go¶
- Training Period: We start by training our EGARCH model on a historical period of Microsoft stock returns.
- One-Step-Ahead Forecast: We then use this trained model to predict the volatility for just the very next day.
- "Walking Forward": Once that day's actual return comes in, we add it to our historical data, effectively extending our training period by one day. We then retrain the EGARCH model with this slightly larger dataset and make a new forecast for the next day.
- Repeating the Process: We repeat this process over and over again, walking forward one day at a time, constantly updating our model with the newest available information.
Why is This Important?¶
This approach simulates how a real-world forecasting system would work:
- Adapts to New Information: Markets change. By continuously retraining, the model can adapt to recent shifts in volatility patterns.
- More Realistic Performance: It gives a much more realistic picture of how well the model would perform in predicting future volatility in a dynamic environment, rather than assuming market conditions remain static after the initial training.
- Accounts for Volatility Persistence: Because EGARCH models high persistence, retraining allows the model to constantly "reset" its starting point for forecasting based on the most recent realized volatility, making the short-term forecasts more accurate.
Your Code's Process:¶
Your code implements this walk-forward strategy for Microsoft stock volatility:
Test Data Size (20%):¶
You've set aside the last 20% of your cleaned_microsoft_df (1,288 days, or about five years of trading data) to act as your "test period." This is the period for which you're generating predictions.
Looping Through the Test Period:¶
For each day in this 1,288-day test period, the code does the following:

- y_train = cleaned_microsoft_df.iloc[: -(test_size - i)]: creates a training dataset that includes all historical data up to the day before the current prediction target. As i increases, this training window expands by one day.
- model = arch_model(y_train['return'], vol="EGarch", p=1, q=1, rescale=False).fit(disp=0): re-trains the EGARCH(1,1) model on this expanding historical data.
- next_pred = model.forecast(horizon=1, reindex=False).variance.iloc[0,0] ** 0.5: uses the newly trained model to make a one-day-ahead forecast of the stock's volatility (the square root of the variance).
- predictions.append(next_pred): saves this single-day forecast.
- y_test_wfv = pd.Series(predictions, index=cleaned_microsoft_df.tail(test_size).index): compiles all the individual one-day-ahead predictions into a single series, aligned with the actual dates of the test period.
Understanding the Result:¶
The full y_test_wfv series (shown above) contains 1,288 one-day-ahead volatility forecasts, running from April 24, 2020 to June 9, 2025.
The Dates and Values¶
The output shows the first few predicted volatility values (e.g., 3.86% for April 24, 2020) and the last few (e.g., 1.46% for June 6, 2025).
Notice how the predicted volatility starts at a high level (around 3.8%) in April 2020. This makes perfect sense, as our previous plot showed volatility spiking significantly around that time due to the COVID-19 market events. The model, when trained up to April 23, 2020, correctly forecasts high volatility for the next day.
As the forecast period extends (walking forward to June 2025), the predicted volatility values gradually decrease (to around 1.4%). This reflects that, over time, the market experienced a gradual normalization, and the EGARCH model, constantly learning from new data, adapted to this change.
What This Means for the Project¶
I have successfully generated a series of realistic, rolling one-day-ahead volatility forecasts for Microsoft stock. This y_test_wfv series represents the model's best guess for future volatility, updated daily based on the latest available market information.
Understanding Microsoft Stock Volatility: A Visual Check of Our Predictions¶
fig, ax = plt.subplots(figsize=(15, 6))
# Plot returns for test data
cleaned_microsoft_df['return'].tail(test_size).plot(ax=ax, label="Microsoft Return")
# Plot volatility predictions * 2
(2 * y_test_wfv).plot(ax=ax, c="C1", label="2 SD Predicted Volatility")
# Plot volatility predictions * -2
(-2 * y_test_wfv).plot(ax=ax, c="C1")
# Label axes
plt.xlabel("Date")
plt.ylabel("Return")
# Add legend
plt.legend();
A Visual Check of Our Predictions¶
The plot above is fantastic for visually explaining our EGARCH model's performance to a non-technical audience. It shows how well the model's volatility predictions "contain" the actual stock returns.
Let's break it down.
We've been using a sophisticated model (EGARCH) to predict how much Microsoft's stock price is expected to jump up or down on any given day – what we call its volatility. Now, let's see how those predictions look when we overlay them with the actual daily returns of Microsoft stock.
Imagine you're trying to predict the range of movement for a ball bouncing up and down. Our model tries to predict the "boundaries" within which that ball is likely to stay.
What You See in the Plot:¶
Blue Line: Microsoft Return¶
The blue line represents Microsoft stock's actual daily percentage changes (returns). When the line goes up, the stock gained that day; when it goes down, it lost.
Notice how some days the line is very tall (big gain) or very deep (big loss), especially during periods of high market uncertainty.
Orange Lines: 2 Standard Deviation (SD) Predicted Volatility¶
These two orange lines represent our EGARCH model's predictions for the expected range of Microsoft's daily returns.
Think of these lines as a "confidence band" or "envelope" around the returns.
In statistics, one standard deviation (SD) is a common measure of spread or volatility. Two standard deviations (2 SD) typically cover about 95% of the expected daily movements if returns follow a normal distribution.
So, these orange lines are our model's estimate of the range where Microsoft's returns are likely to fall 95% of the time on any given day. The top orange line is +2 SD, and the bottom orange line is -2 SD from the average return (which is close to zero).
How to "Read" the Plot:¶
Volatility Clustering in Action:¶
Look at periods where the orange lines are far apart (e.g., around early 2021, and especially late 2022 / early 2023, and parts of early 2025). This means our model was predicting high volatility (large expected price swings). During these times, you'll see the blue Microsoft Return line also tends to swing wildly, with larger up and down movements.
Conversely, when the orange lines are closer together (e.g., mid-2021, late 2023 to early 2024), our model was predicting lower volatility (smaller expected price swings). In these periods, the blue Microsoft Return line tends to stay within a narrower band.
This visual correlation shows that our EGARCH model is doing a good job of capturing the volatility clustering effect we discussed earlier – high volatility follows high volatility, and low volatility follows low volatility.
Model Accuracy (Containment):¶
Ideally, most of the blue Microsoft Return line should stay within the two orange lines.
If a blue spike (a large return, positive or negative) goes outside the orange lines, it means that day's price movement was even more extreme than what our model expected for 95% of the time. This is a "surprise" to the model.
You can see that for the most part, the blue line stays within the orange boundaries, which is a good sign that our model is providing reasonable predictions for the typical range of price movements. The occasional excursions outside the band are expected (since it's a 95% confidence band, 5% of observations are expected to fall outside).
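This containment idea can also be checked numerically. The sketch below uses synthetic stand-in arrays (the names pred_vol and returns are illustrative; in the notebook they would correspond to y_test_wfv and the test-period returns) to compute both the theoretical and the empirical coverage of a plus-or-minus 2-SD band:

```python
import numpy as np
from scipy import stats

# Synthetic stand-ins for the walk-forward forecasts and realized returns.
rng = np.random.default_rng(0)
pred_vol = rng.uniform(1.0, 3.0, size=1288)  # stand-in predicted daily volatility (%)
returns = rng.normal(0.0, pred_vol)          # stand-in realized daily returns (%)

# Theoretical coverage of a +/-2-SD band under normality:
print(round(2 * stats.norm.cdf(2) - 1, 4))   # 0.9545

# Empirical coverage: fraction of days whose return falls inside the band
coverage = np.mean(np.abs(returns) <= 2 * pred_vol)
print(f"Empirical coverage: {coverage:.1%}")
```

If the model is well calibrated, the empirical coverage on the real data should sit near 95%; a value far below that would signal that the band is too narrow and the model is underestimating risk.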
Visualizing the "Leverage Effect" (Subtly):¶
While not immediately obvious just from the lines themselves without careful inspection, the EGARCH model's ability to adjust its predictions more quickly to negative shocks (as quantified by the gamma[1] coefficient) is implicitly driving how these orange lines spread and contract. For instance, a sharp downward spike in the blue line might cause the orange lines to widen more dramatically than a similar upward spike, reflecting the model's understanding of the leverage effect.
In Summary:¶
This plot visually confirms that our EGARCH model is effective at predicting the expected daily "swinginess" (volatility) of Microsoft stock. The orange lines show the range where we expect most of the daily returns to fall. You can see how our model adapts: when the market gets more turbulent (like in 2022-2023), our predicted volatility range widens, accurately reflecting the increased uncertainty.
Conclusion¶
This project set out to demystify stock volatility by comparing two tech giants, Microsoft and Apple, and then diving deep into forecasting Microsoft's "swinginess." Our analysis has yielded several key insights, crucial for anyone looking to make more informed investment decisions.
Firstly, our comparative look at historical volatility revealed a clear distinction: Apple stock was indeed more volatile (or "swingy") than Microsoft stock over the period we observed. This suggests that, on average, Apple experienced larger daily price fluctuations, making it a potentially higher-risk, higher-reward asset compared to Microsoft.
Secondly, focusing on Microsoft, our advanced GARCH and EGARCH models provided compelling evidence that its stock volatility is far from random. We confirmed the presence of volatility clustering, meaning periods of high "swinginess" tend to follow each other, and periods of calm are similarly grouped. Furthermore, we found that volatility is highly persistent; any shock to Microsoft's stock "swinginess" tends to have a long-lasting impact, meaning current volatility levels are excellent predictors of future levels.
Most significantly, the EGARCH model uncovered a crucial phenomenon known as the "leverage effect." This demonstrates that negative news or price drops in Microsoft's stock have a disproportionately larger impact on increasing future volatility compared to positive news or price gains of the same magnitude. In essence, the market reacts more intensely to bad news for Microsoft, leading to greater future uncertainty. Statistically, the EGARCH model proved to be superior in fitting our data, precisely because it effectively captured this critical asymmetric response. Our visual forecasts further demonstrated the model's ability to provide a realistic expected range for daily price movements, adapting to changing market conditions.
In conclusion, by applying sophisticated volatility models, we've not only illuminated the differing risk profiles of Apple and Microsoft stock but also gained a clearer understanding of Microsoft's stock risk behavior, particularly its tendency for volatility to react more intensely to negative events. These insights are invaluable for navigating the complexities of modern financial markets, helping investors manage risk and make more strategic choices.
Limitations and Future Work¶
While our analysis provides robust insights, it's important to acknowledge certain limitations and areas for future exploration:
Limitations:¶
Model Simplification: Our GARCH and EGARCH models, while powerful, are statistical simplifications of extremely complex market dynamics. They primarily model the time-varying nature of variance based on past returns, rather than directly incorporating the underlying causes of price movements.
External Factors: The models do not explicitly account for external, non-quantitative factors that significantly influence stock prices and volatility, such as company-specific news (e.g., product launches, earnings reports), broader economic indicators (e.g., inflation, interest rates), geopolitical events, or shifts in investor sentiment. These factors are only implicitly captured through their impact on historical returns.
Distribution Assumption: Our models assumed a normal distribution for the standardized residuals. While a common starting point, financial returns often exhibit "fat tails" (more extreme observations than predicted by a normal distribution). Using alternative distributions (e.g., Student's t-distribution) might provide a more accurate fit to the data's true characteristics.
Forecasting Horizon: The forecasts presented are for a relatively short-term horizon (e.g., 30 days or one-day-ahead in the walk-forward). Predicting volatility accurately over very long horizons becomes increasingly challenging due to the inherent unpredictability of future shocks and shifts in market regimes.
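The distribution assumption above is easy to illustrate: heavy-tailed returns show positive excess kurtosis that a normal distribution cannot match, while a Student's t can. A quick simulated comparison (df=5 is an arbitrary illustrative choice, not an estimate from our data):

```python
import numpy as np
from scipy import stats

# Compare sample excess kurtosis of a normal sample and a fat-tailed
# Student's t sample of the same size. Real daily returns typically look
# more like the latter.
rng = np.random.default_rng(1)
normal_sample = rng.standard_normal(100_000)
t_sample = rng.standard_t(df=5, size=100_000)  # heavier tails

print(round(stats.kurtosis(normal_sample), 2))  # excess kurtosis near 0
print(round(stats.kurtosis(t_sample), 2))       # clearly positive (theory: 6 for df=5)
```

In the arch package, switching the conditional distribution is a one-line change: passing dist="t" to arch_model fits a Student's t error distribution and estimates its degrees of freedom alongside the GARCH parameters.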
Future Work:¶
Broader Model Exploration: Investigating other advanced GARCH family models, such as GJR-GARCH (as discussed in earlier iterations), APARCH, or Fractionally Integrated GARCH (FIGARCH) for potentially long-memory processes, could offer even more nuanced insights.
Inclusion of Exogenous Variables: Enhancing the models by incorporating relevant macroeconomic variables (e.g., interest rates, inflation), industry-specific news, or sentiment analysis data could potentially improve forecasting accuracy by accounting for external drivers of volatility.
Comparative Asymmetry Analysis: Applying the EGARCH and GJR-GARCH analysis to Apple's stock volatility as well would provide a direct comparison of the leverage effect between the two companies, offering deeper insights into their respective risk profiles.
Alternative Distributions: Re-fitting the models using different conditional distributions for the errors (e.g., Student's t-distribution or Generalized Error Distribution) could yield more robust parameter estimates and potentially better forecast performance, especially for extreme events.
Model Evaluation Metrics: Beyond visual inspection and information criteria, employing advanced backtesting techniques and loss functions specifically designed for volatility forecasts (e.g., QLIKE loss function) would provide a more rigorous quantitative assessment of model accuracy.